Mode estimation


Optimal Self-Consistency for Efficient Reasoning with Large Language Models

Feng, Austin, Alonso, Marius, Odonnat, Ambroise

arXiv.org Machine Learning

Self-consistency (SC) is a widely used test-time inference technique for improving performance in chain-of-thought reasoning. It involves generating multiple responses, or samples, from a large language model (LLM) and selecting the most frequent answer. This procedure can naturally be viewed as a majority vote or empirical mode estimation. Despite its effectiveness, SC is prohibitively expensive at scale when naively applied to datasets, and it lacks a unified theoretical treatment of sample efficiency and scaling behavior. In this paper, we provide the first comprehensive analysis of the scaling behavior of SC and its variants, drawing on mode estimation and voting theory. We derive and empirically validate power-law scaling for self-consistency across datasets, and analyze the sample efficiency of fixed-allocation and dynamic-allocation sampling schemes. From these insights, we introduce Blend-ASC, a novel variant of self-consistency that dynamically allocates samples to questions during inference, achieving state-of-the-art sample efficiency. Our approach uses 6.8x fewer samples than vanilla SC on average and outperforms both fixed- and dynamic-allocation SC baselines, demonstrating its superior efficiency. In contrast to existing variants, Blend-ASC is hyperparameter-free and can fit an arbitrary sample budget, so it can easily be applied to any self-consistency application.
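
As a concrete illustration (my own sketch, not the authors' code), self-consistency's majority vote is simply the empirical mode of the sampled answers, assuming the final answers have already been parsed from the model's generations:

```python
from collections import Counter

def self_consistency(samples):
    """Majority vote over sampled answers, i.e., the empirical mode.

    `samples` holds the final answers parsed from independent
    chain-of-thought generations for the same question.
    """
    counts = Counter(samples)
    answer, _count = counts.most_common(1)[0]
    return answer

# Five sampled answers to one question; "42" is the most frequent.
print(self_consistency(["42", "41", "42", "42", "7"]))  # prints 42
```

Dynamic-allocation variants such as the paper's Blend-ASC decide per question how many such samples to draw, rather than fixing the sample count in advance.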



Almost Linear Time Consistent Mode Estimation and Quick Shift Clustering

Hashemian, Sajjad

arXiv.org Machine Learning

In this paper, we propose a method for density-based clustering in high-dimensional spaces that combines Locality-Sensitive Hashing (LSH) with the Quick Shift algorithm. The Quick Shift algorithm, known for its hierarchical clustering capabilities, is extended by integrating approximate Kernel Density Estimation (KDE) using LSH to provide efficient density estimates. The proposed approach achieves almost linear time complexity while preserving the consistency of density-based clustering.
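
A toy one-dimensional illustration of the Quick Shift step (my own sketch, with an exact Gaussian KDE standing in for the paper's LSH-based approximate density estimates):

```python
import math

def quick_shift(points, bandwidth=1.0, tau=3.0):
    """Minimal Quick Shift on 1-D points with a naive Gaussian KDE.

    Each point is linked to its nearest neighbour of strictly higher
    density within radius `tau`; points with no such neighbour are modes,
    and each point is labelled by the mode its parent chain reaches.
    """
    def kde(x):
        return sum(math.exp(-((x - p) / bandwidth) ** 2) for p in points)

    density = [kde(p) for p in points]
    parent = list(range(len(points)))
    for i, p in enumerate(points):
        best, best_d = i, float("inf")
        for j, q in enumerate(points):
            d = abs(p - q)
            if density[j] > density[i] and d < min(best_d, tau):
                best, best_d = j, d
        parent[i] = best

    def root(i):  # follow parent links up to a mode
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(len(points))]

# Two well-separated groups yield two cluster roots.
labels = quick_shift([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
print(len(set(labels)))  # prints 2
```

The exact `kde` above costs O(n) per query; replacing it with an LSH-based approximation is what brings the overall algorithm to almost linear time.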


Optimal rates for k-NN density and mode estimation

Sanjoy Dasgupta, Samory Kpotufe

Neural Information Processing Systems

We present two related contributions of independent interest: (1) high-probability finite-sample rates for k-NN density estimation, and (2) practical mode estimators, based on k-NN, which attain minimax-optimal rates under surprisingly general distributional conditions.
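
A minimal one-dimensional sketch of the ingredients (illustrative code of my own): the k-NN density estimate at x is k divided by n times the volume of the ball reaching x's k-th nearest sample, and a simple mode estimator returns the sample point of highest estimated density:

```python
def knn_density_1d(x, data, k):
    """k-NN density estimate in 1-D: f_k(x) = k / (n * 2 * r_k(x)),
    where r_k(x) is the distance from x to its k-th nearest sample
    and 2 * r_k(x) is the volume of the 1-D ball of radius r_k(x)."""
    n = len(data)
    r_k = sorted(abs(x - p) for p in data)[k - 1]
    return k / (n * 2 * r_k)

def knn_mode(data, k):
    """A simple plug-in mode estimator: the sample point of highest
    estimated density (one practical estimator in this vein)."""
    return max(data, key=lambda x: knn_density_1d(x, data, k))

# The densest region of the sample is around 5.0.
print(knn_mode([1.0, 4.8, 5.0, 5.2, 9.0], k=2))
```

With k >= 2 and distinct samples the k-th nearest distance is positive, so the estimate is well defined at the sample points themselves.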


Mode Estimation for High Dimensional Discrete Tree Graphical Models

Neural Information Processing Systems

This paper studies the following problem: given samples from a high-dimensional discrete distribution, we want to estimate the leading $(\delta,\rho)$-modes of the underlying distribution. A point is defined to be a $(\delta,\rho)$-mode if it is a local optimum of the density within a $\delta$-neighborhood under the metric $\rho$. As we increase the "scale" parameter $\delta$, the neighborhood size increases and the total number of modes monotonically decreases. The sequence of $(\delta,\rho)$-modes reveals intrinsic topographical information about the underlying distribution. Though the mode-finding problem is generally intractable in high dimensions, this paper unveils that, if the distribution can be approximated well by a tree graphical model, mode characterization is significantly easier.


Mode Estimation with Partial Feedback

Arnal, Charles, Cabannes, Vivien, Perchet, Vianney

arXiv.org Machine Learning

The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments. These new learning pipelines call for new theoretical frameworks. In this paper, we formalize core aspects of weakly supervised and active learning with a simple problem: the estimation of the mode of a distribution using partial feedback. We show how entropy coding allows for optimal information acquisition from partial feedback, develop coarse sufficient statistics for mode identification, and adapt bandit algorithms to our new setting. Finally, we combine those contributions into a statistically and computationally efficient solution to our problem.


Optimal Kernel for Kernel-Based Modal Statistical Methods

Yamasaki, Ryoya, Tanaka, Toshiyuki

arXiv.org Artificial Intelligence

Kernel-based modal statistical methods include mode estimation, regression, and clustering. The estimation accuracy of these methods depends on the kernel used as well as the bandwidth. We study the effect of the choice of kernel function on the estimation accuracy of these methods. In particular, we theoretically derive a (multivariate) optimal kernel that minimizes an analytically obtained asymptotic error criterion when an optimal bandwidth is used, among a certain kernel class defined via the number of its sign changes.
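
For intuition only, here is a kernel-based mode estimate in its crudest form: a grid argmax of a KDE (my own sketch). The Gaussian kernel and the bandwidth below are arbitrary choices; the paper's point is precisely that the kernel itself can be optimized.

```python
import math

def kde(x, data, h, kernel=lambda u: math.exp(-0.5 * u * u)):
    """Kernel density estimate at x (normalizing constants omitted,
    since they do not change the location of the mode)."""
    return sum(kernel((x - p) / h) for p in data) / (len(data) * h)

def kde_mode(data, h, grid_steps=200):
    """Mode estimate as the argmax of the KDE over a grid, a crude
    stand-in for the gradient-based schemes used in modal methods."""
    lo, hi = min(data), max(data)
    grid = [lo + (hi - lo) * i / grid_steps for i in range(grid_steps + 1)]
    return max(grid, key=lambda x: kde(x, data, h))

# A cluster near 5.0 plus two outliers; the KDE peaks near 5.0.
print(kde_mode([4.8, 4.9, 5.0, 5.1, 5.2, 1.0, 9.0], h=0.5))
```

Swapping the `kernel` argument (e.g., for a kernel with fewer sign changes) changes the asymptotic error of the resulting mode estimate, which is the trade-off the paper analyzes.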


Bagged $k$-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets

Hang, Hanyuan

arXiv.org Artificial Intelligence

In this paper, we propose an ensemble learning algorithm named \textit{bagged $k$-distance for mode-based clustering} (\textit{BDMBC}) by putting forward a new measurement called the \textit{probability of localized level sets} (\textit{PLLS}), which enables us to find all clusters for varying densities with a global threshold. On the theoretical side, we show that with a properly chosen number of nearest neighbors $k_D$ in the bagged $k$-distance, the sub-sample size $s$, the bagging rounds $B$, and the number of nearest neighbors $k_L$ for the localized level sets, BDMBC can achieve optimal convergence rates for mode estimation. It turns out that with a relatively small $B$, the sub-sample size $s$ can be much smaller than the number of training data $n$ at each bagging round, and the number of nearest neighbors $k_D$ can be reduced simultaneously. Moreover, we establish optimal convergence results for the level set estimation of the PLLS in terms of Hausdorff distance, which reveals that BDMBC can find localized level sets for varying densities and thus enjoys local adaptivity. On the practical side, we conduct numerical experiments to empirically verify the effectiveness of BDMBC for mode estimation and level set estimation, which demonstrates the promising accuracy and efficiency of our proposed algorithm.


PAC Mode Estimation using PPR Martingale Confidence Sequences

Jain, Shubham Anand, Gupta, Sanit, Mehta, Denil, Nair, Inderjeet Jayakumar, Shah, Rohan, Vora, Jian, Khyalia, Sushil, Das, Sourav, Ribeiro, Vinay J., Kalyanakrishnan, Shivaram

arXiv.org Machine Learning

We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn according to $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. Noting the efficiency of prior-posterior-ratio (PPR) martingale confidence sequences for handling this special case, we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. We observe that the "one-versus-one" principle yields a more efficient generalisation than the "one-versus-rest" alternative. Our resulting stopping rule, denoted PPR-ME, is optimal in its sample complexity up to a logarithmic factor. Moreover, PPR-ME empirically outperforms several other competing approaches for mode estimation. We demonstrate the gains offered by PPR-ME in two practical applications: (1) sample-based forecasting of the winner in indirect election systems, and (2) efficient verification of smart contracts in permissionless blockchains.
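
To make the sequential setting concrete, here is a naive stopping rule of my own (a toy baseline, not PPR-ME: PPR-ME instead stops based on one-vs-one prior-posterior-ratio confidence sequences and carries a formal error guarantee):

```python
from collections import Counter

def sequential_mode(sample, gap=3, max_samples=10000):
    """Draw i.i.d. samples and stop once the leading value's count
    exceeds the runner-up's by `gap`. Returns (mode guess, samples used).
    A toy heuristic with no correctness guarantee, unlike PPR-ME."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample()] += 1
        top = counts.most_common(2)
        runner_up = top[1][1] if len(top) > 1 else 0
        if top[0][1] - runner_up >= gap:
            return top[0][0], n
    return counts.most_common(1)[0][0], max_samples

# Deterministic stream for illustration: "a" pulls ahead by 3 at n = 5.
stream = iter(["a", "a", "b", "a", "a"])
print(sequential_mode(lambda: next(stream), gap=3))  # prints ('a', 5)
```

The point of adaptive rules like PPR-ME is that the number of samples used scales with how hard the mode is to distinguish from the runner-up, rather than being fixed in advance.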